-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate feature extractor networks for DQN networks #132
Conversation
Nice catch, but it was working before? I think we may have the issue with SAC/TD3 too then. |
I do not see how it could have worked before, unless somewhere along the line the models got cloned or the references to data they use were separated (not sure how that could happen, either...). As for SAC/TD3, I heard they share the feature extractor part but this is not updated as part of target-update, right? In DQN you update the whole network, and thus you have full copies of both. Edit: No wait, we might have the same issue, as it seems like whole networks are being updated. Those will require more careful testing to see what is right for SAC/TD3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice catch =)
Do you have an easy way of testing that? Otherwise, I will merge as soon CI passes.
For SAC/TD3, we will see later, as I would like to share the feature extractor between actor and critic (otherwise it is super slow, and even when sharing, it is slow)
If the extractor isn’t flatten, then using the “is” operator with the .parameters() function should catch it.
… On 30 Jul 2020, at 21:36, Antonin RAFFIN ***@***.***> wrote:
@araffin approved this pull request.
LGTM, nice catch =)
Do you have an easy way of testing that? Otherwise, I will merge as soon CI passes.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Ok, I think I will delay the test to SAC/TD3 fix, as this is a hotfix for DQN. |
Closes #131
Description
DQN main and target network accidentally shared feature extractor network. This fix creates new feature extractor each time new QNetwork is made.
Motivation and Context
Types of changes
Checklist:
make format
(required)make check-codestyle
andmake lint
(required)make pytest
andmake type
both pass. (required)